Developing Simplified Chinese Psychological Linguistic Analysis Dictionary for Microblog

Authors

  • Rui Gao
  • Bibo Hao
  • He Li
  • Yusong Gao
  • Tingshao Zhu

Abstract

The words people use can reveal their emotional states, intentions, thinking styles, individual differences, and more. LIWC (Linguistic Inquiry and Word Count) has been widely used for psychological text analysis, and its dictionary is the core of the tool. A Traditional Chinese version of the LIWC dictionary, translated from the English LIWC dictionary, has been released. However, Simplified Chinese, the world's most widely used language, differs subtly from Traditional Chinese. Furthermore, both the English LIWC dictionary and the Traditional Chinese dictionary were developed for relatively formal text. Microblogs have become increasingly popular in China, yet the original LIWC dictionaries give little consideration to words that are popular on microblogs, which makes them less applicable to microblog text analysis. In this study, a Simplified Chinese LIWC dictionary was built according to the LIWC categories. After the Traditional Chinese dictionary was converted into Simplified Chinese, the five thousand words most frequently used on microblogs were added to the dictionary. Four psychology graduate students rated whether each word belonged in a category, and the reliability and validity of the Simplified Chinese LIWC dictionary were then tested using these four judges' ratings. The new dictionary could support future text analysis of microblogs.
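In LIWC-style analysis the dictionary acts as a lookup table: each word in a text is checked against the word list of every category, and the output is the percentage of words that fall into each category. The following is a minimal Python sketch of that counting step under stated assumptions, not the LIWC software or the authors' implementation. The category names, the toy entries, and the functions `word_matches` and `liwc_style_scores` are hypothetical. The input is assumed to be already segmented into words, since Chinese text has no spaces between words and would need a word segmenter first. A trailing `*` on an entry is treated as a prefix wildcard, following the stem convention used in LIWC dictionary files.

```python
from typing import Dict, List, Set

# Hypothetical dictionary type: category name -> set of entries.
# An entry ending in "*" matches any word starting with that stem.
Dictionary = Dict[str, Set[str]]


def word_matches(word: str, entry: str) -> bool:
    """Return True if the word equals the entry or matches its '*' prefix stem."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def liwc_style_scores(tokens: List[str], dictionary: Dictionary) -> Dict[str, float]:
    """Return the percentage of tokens that fall into each category.

    Assumes tokens are already word-segmented. A token may count toward
    several categories; each category is scored independently.
    """
    total = len(tokens)
    scores: Dict[str, float] = {}
    for category, entries in dictionary.items():
        # Count tokens matching at least one entry of this category.
        hits = sum(1 for tok in tokens if any(word_matches(tok, e) for e in entries))
        scores[category] = 100.0 * hits / total if total else 0.0
    return scores


if __name__ == "__main__":
    # Toy, hypothetical categories and entries for illustration only.
    toy_dict: Dictionary = {
        "i": {"我"},
        "posemo": {"开心", "哈哈*"},
        "negemo": {"难过"},
    }
    tokens = ["哈哈哈", "我", "今天", "很", "开心"]
    print(liwc_style_scores(tokens, toy_dict))
    # → {'i': 20.0, 'posemo': 40.0, 'negemo': 0.0}
```

Here "哈哈哈", a typical microblog expression, is caught by the wildcard entry "哈哈*". This illustrates why adding frequent microblog words to a formal-text dictionary can change the resulting category scores.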

Publication date: 2013